
CompatHelper: bump compat for "CUDA" to "5" #155

Merged

Conversation

github-actions[bot]
Contributor

This pull request changes the compat entry for the CUDA package from 3.5, 4 to 3.5, 4, 5.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.
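In Project.toml terms, the proposed change amounts to the following (a sketch showing only the affected entry; the package's other compat entries are omitted):

```toml
[compat]
CUDA = "3.5, 4, 5"
```

In Julia's Pkg format, each comma-separated specifier allows the semver-compatible range it implies, so this entry accepts CUDA.jl 3.5.x, any 4.x, and any 5.x release.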

@maleadt
Member

maleadt commented Sep 20, 2023

Benchmark results for commit dedaad6 (comparing to bca8c30):

| test | master | PR | Δmin |
|------|--------|----|------|
| BLAS WMMA FP16'*FP16'=FP16 (256×256×256, α) | 45.6 ± 0.14 μs | 36.7 ± 0.1 μs | 19.7% ✅ |
| BLAS WMMA FP16'*FP16'=FP16 (4096×4096×4096, α) | 5.89 ± 0.035 ms | 4.28 ± 0.026 ms | 29.2% ✅ |
| BLAS WMMA FP16'*FP16'=FP32 (256×256×256, α) | 34.2 ± 0.42 μs | 27.96 ± 0.089 μs | 15.2% ✅ |
| BLAS WMMA FP16*FP16'=FP16 (4096×4096×4096, α) | 5.355 ± 0.0021 ms | 4.07 ± 0.019 ms | 26.2% ✅ |
| BLAS WMMA FP16*FP16'=FP32 (256×256×256, α) | 33.5 ± 0.49 μs | 27.99 ± 0.09 μs | 14.8% ✅ |
| BLAS WMMA FP16*FP16=FP16 (256×256×256, α, β) | 38.1 ± 0.56 μs (16 bytes local mem) | 37.39 ± 0.077 μs (8 bytes local mem) | 0.2% |
| BLAS WMMA FP16*FP16=FP16 (256×256×256, β) | 34.5 ± 0.54 μs (153 regs) | 34.2 ± 0.11 μs (155 regs) | -0.2% |
| BLAS WMMA FP16*FP16=FP16 (4096×4096×4096, α, β) | 4.16 ± 0.024 ms (16 bytes local mem) | 4.11 ± 0.03 ms (8 bytes local mem) | 3.3% |
| BLAS WMMA FP16*FP16=FP16 (4096×4096×4096, β) | 2.986 ± 0.0021 ms (153 regs) | 3.034 ± 0.0048 ms (155 regs) | -1.4% |
| BLAS WMMA FP16*FP16=FP32 (256×256×256, α, β) | 29.8 ± 0.53 μs (182 regs) | 29.39 ± 0.083 μs (178 regs) | -0.8% |
| BLAS WMMA FP16*FP16=FP32 (4096×4096×4096, α, β) | 4.43 ± 0.07 ms (182 regs) | 4.81 ± 0.04 ms (178 regs) | -3.4% |

@codecov

codecov bot commented Sep 20, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (bca8c30) 32.17% compared to head (dedaad6) 32.17%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #155   +/-   ##
=======================================
  Coverage   32.17%   32.17%           
=======================================
  Files          11       11           
  Lines         889      889           
=======================================
  Hits          286      286           
  Misses        603      603           


@maleadt
Member

maleadt commented Sep 20, 2023

The change in registers is due to the removal of a quirk in JuliaGPU/CUDA.jl@760c2bd#diff-d1deb39259ffe01ecfe5db7c6c2331af73b81772539dd8ef49076d58fa2529d2, but AFAICT that actually improves performance. I still have to find the culprit of the other slowdowns, but I can't reproduce it locally.

@maleadt maleadt force-pushed the compathelper/new_version/2023-09-20-00-30-53-609-4285891187 branch from 16416fe to 47ee618 on September 20, 2023 13:32
@maleadt
Member

maleadt commented Sep 20, 2023

Painfully bisected to JuliaGPU/CUDA.jl#2025, which doesn't make sense, as we manually synchronize the stream and thus do not rely on non-blocking synchronization. The only point where non-blocking synchronization is used is when allocating the arrays during the set-up of each benchmark. Maybe we're just introducing lots of noise there right before each benchmark?

Getting rid of that seems to greatly improve timings, so hopefully that fixes the issue here.
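The workaround described above can be sketched roughly as follows. This is illustrative only, not the benchmark harness's actual code: the function name and matrix sizes are assumptions, and the point is simply to drain any pending (non-blocking) synchronization work from the set-up allocations before the timed region starts.

```julia
using CUDA
using BenchmarkTools

# Hypothetical benchmark set-up: allocate the inputs, then force
# completion of all pending GPU work so that allocation-related
# synchronization noise does not leak into the timed region.
function setup_inputs(N)
    A = CUDA.rand(Float16, N, N)
    B = CUDA.rand(Float16, N, N)
    CUDA.synchronize()   # drain pending work before timing starts
    return A, B
end

A, B = setup_inputs(256)

# Time only the operation of interest, blocking on the stream so the
# measurement covers the full kernel execution.
t = @belapsed CUDA.@sync $A * $B
```

The design point is that `CUDA.synchronize()` after set-up separates allocation-time activity from the measurement, while `CUDA.@sync` inside the timed expression ensures the kernel itself is fully measured.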

@maleadt maleadt force-pushed the compathelper/new_version/2023-09-20-00-30-53-609-4285891187 branch from 47ee618 to dedaad6 on September 20, 2023 16:41
@maleadt maleadt marked this pull request as ready for review September 20, 2023 17:27
@maleadt maleadt merged commit 67d4844 into master Sep 20, 2023
@maleadt maleadt deleted the compathelper/new_version/2023-09-20-00-30-53-609-4285891187 branch September 20, 2023 17:40